Tidying up the Basement: A Tale of Large-Scale Parsing on National eInfrastructure
Abstract
Until about six years ago, our research group spent non-trivial amounts of project funds and researcher time on maintaining a dedicated server farm in the basement of our department. Rack space and cooling (just as much as funds and time) were in short supply, and we never quite got around to implementing automated load balancing across compute nodes, tuning the Linux kernel and filesystem for optimum performance, or connecting to the uninterruptible power supply. When pointed to the Norwegian National High-Performance Computing Initiative, we were initially doubtful that Natural Language Processing should be among their target user groups. Also, we were a tad hesitant to give up control of our own equipment and, of course, worried we would miss what we thought were our fancy toys. Today, any member of the group can access thousands of CPUs simultaneously, we have about five terabytes of project data on-line, and our research has scaled to dataset sizes and turn-around times that would be just inconceivable on group-local hardware, at no charge to our project funds and with no administrator responsibilities. For example, we can typically complete ‘deep’ semantic parsing of the roughly 900 million words of the English Wikipedia in less than one day (while expending what would be about eight sequential years of computation). Or, when searching for the best-performing features and hyper-parameters in a machine learning problem, we can explore a large ‘grid’ of possible configurations in parallel, without much need for a staged, partly manual, ‘coarse-to-fine’ search strategy. Access to the very large-scale Norwegian National eInfrastructure and its high-quality technical support has enabled a comparatively computation-heavy research profile of our group and has thus contributed to its international competitiveness.
In this presentation, I will review some of our experiences in establishing a dialogue with the HPC crowd and propose HPC for the Masses as a candidate vision in the on-going development trend towards more and more large-scale computational sciences.
Publication date: 2013